Abstract

We extend the classical coupon collector's problem to one in which two collectors are 
simultaneously and independently seeking collections of d coupons. We find, in finite 
terms, the probability that the two collectors finish at the same trial, and we find, using 
the methods of Gessel-Viennot, the probability that the game has the following "ballot-like"
character: the two collectors are tied with each other for some initial number of
steps, and after that the player who first gains the lead remains ahead throughout the 
game. As a by-product we obtain the evaluation in finite terms of certain infinite series 
whose coefficients are powers and products of Stirling numbers of the second kind. 

We study the variant of the original coupon collector's problem in which a single 
collector wants to obtain at least h copies of each coupon. Here we give a simpler 
derivation of results of Newman and Shepp, and extend those results. Finally we 
obtain the distribution of the number of coupons that have been obtained exactly once 
("singletons") at the conclusion of a successful coupon collecting sequence. 
1 Introduction and results

The classical coupon collector's problem is the following. Suppose that a breakfast cereal 
manufacturer offers a souvenir ("coupon") hidden in each package of cereal, and there are d
different kinds of souvenirs altogether. The collector wants to have a complete collection of
all d souvenirs. What is the probability p(n, d) that exactly n boxes of cereal will have to be
purchased in order to obtain, for the first time, a complete collection of at least one of each
of the d kinds of souvenir coupons?

The answer to that question is well known (e.g., P, p. 132) to be

\[ p(n,d) = \frac{d!}{d^n} {n-1 \brace d-1}, \tag{1} \]

where the ${n \brace k}$'s are the Stirling numbers of the second kind.
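As a quick sanity check on the classical formula, the following Python sketch evaluates p(n, d) = d! S(n-1, d-1) / d^n exactly with rational arithmetic, where S denotes the Stirling numbers of the second kind. The function names `stirling2` and `completion_probability` are illustrative choices and do not come from the paper.

```python
from fractions import Fraction
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind, via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n < 0 or k < 0:
        return 0
    row = [1] + [0] * k  # row for n = 0: S(0,0) = 1, S(0,j) = 0 for j > 0
    for _ in range(n):
        # Update from the right so row[j - 1] still holds the previous row.
        for j in range(k, 0, -1):
            row[j] = j * row[j] + row[j - 1]
        row[0] = 0
    return row[k]


def completion_probability(n, d):
    """Probability that a full set of d coupons is first completed at draw n."""
    return Fraction(factorial(d) * stirling2(n - 1, d - 1), d ** n)


if __name__ == "__main__":
    print(completion_probability(2, 2))  # two coupons, finished in two draws
    print(float(sum(completion_probability(n, 3) for n in range(3, 80))))
```

Summing the probabilities over n should approach 1, which is a convenient check that the normalisation d!/d^n is the right one.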

We study, in this paper, a number of other aspects of this problem, as well as a
generalization of it to a two-player game.

First, suppose we have two coupon collectors, drawing coupons simultaneously, and each 
seeking to obtain a complete collection of d coupons. We ask for the probability that the 
two games are completed at the same time. The answer is given by (7) below. That answer
is expressed in finite terms, owing to the closed form evaluation of the ordinary power series
generating function for the squares of the Stirling numbers of the second kind, contained in
(6).

Next we consider the following two-person game. Again two coupon collectors are
simultaneously drawing coupons at random. This time we are interested in a ballot-like problem:
what is the probability that the player who first completed a collection (the winner) was 
never behind (i.e., never had fewer distinct coupons) at any intermediate stage of the play? 
Here we give a complete answer to a slightly easier question, namely the following: what 
is the probability that after an initial segment of play in which the players are tied, one of 
them takes the lead and keeps the lead until the end. The answer is given in Section 2.8 below,
and is obtained by the Gessel-Viennot theory of nonintersecting lattice paths.

In each of these cases the answer can first be written as an infinite series whose coefficients 
involve various products of Stirling numbers. What is interesting, though, is that in all such 
cases we are able to express the answers in finite terms. Indeed, one of our main results here 
is the observation that infinite series whose coefficients involve various powers and products 
of Stirling numbers of the second kind can readily be evaluated in finite terms. 

In section 3 we return to the original collecting problem of obtaining at least one copy of
each coupon, but now we study the variant of the problem in which a single collector wants
to obtain at least h ≥ 1 copies of each coupon. We obtain the generating function for
the probability that exactly n trials are needed, the exact value of the average number of
trials (38), and the asymptotic behavior (43) of these quantities as d → ∞.

Finally, in section 4 we study the number of coupons that have been collected only
once, at the end of a collection sequence. We find the distribution function (47) for this
number, and show that the average number of these singletons is just the harmonic number
H_d = 1 + 1/2 + ... + 1/d.

2 The two-person collecting competition 
2.1 Simultaneous completion 

We find now the probability of simultaneous completion of two independent coupon collecting
sequences. Evidently this is

\[ \sum_{n} p(n,d)^2 = \sum_{n \ge d} \left(\frac{d!}{d^n}\right)^2 {n-1 \brace d-1}^2, \tag{2} \]

which expresses the answer as an infinite sum. We can rewrite this as a finite sum by finding
a finite expression for the generating function for the squares of the Stirling numbers of the
second kind,

\[ \sum_{n \ge k} {n \brace k}^2 x^n, \]

analogously to the well known generating function for these numbers themselves,

\[ \sum_{n \ge k} {n \brace k} x^n = \frac{x^k}{(1-x)(1-2x)\cdots(1-kx)}. \tag{3} \]

The easiest way to do this is via the standard explicit formula for these Stirling numbers,
viz.

\[ {n \brace k} = \sum_{r=1}^{k} \frac{(-1)^{k-r}\, r^n}{r!\,(k-r)!} = \sum_{r=1}^{k} A_{k,r}\, r^{n-k} \qquad (1 \le k \le n), \tag{4} \]

where we have written

\[ A_{k,r} = \frac{(-1)^{k-r}\, r^{k}}{r!\,(k-r)!}. \tag{5} \]

It follows that

\[ \sum_{n \ge k} {n \brace k}^2 x^n
   = \sum_{n \ge k} x^n \sum_{r,s=1}^{k} A_{k,r}A_{k,s}\,(rs)^{n-k}
   = x^k \sum_{r,s=1}^{k} A_{k,r}A_{k,s} \sum_{n \ge k} (rsx)^{n-k}
   = x^k \sum_{r,s=1}^{k} \frac{A_{k,r}A_{k,s}}{1 - rsx}. \tag{6} \]

Thus for the simultaneous completion probability we obtain, from (2),

\[ \sum_{n > 0} p(n,d)^2 = \frac{(d!)^2}{d^{2d}} \sum_{r,s=1}^{d-1} \frac{A_{d-1,r}A_{d-1,s}}{1 - rs/d^2}, \tag{7} \]

by (6), where the A's are given by (5). This sequence of probabilities, for d = 1, 2, ..., begins as

1, 1/3, 11/70, 9/91, 688877/9561123, 358555/6330324, 2730269557627901/58560931675094420, 146271649897951/3695016639410525, ...,

that is,

1, 0.33333.., 0.15714.., 0.098901.., 0.072049.., 0.056640.., 0.046622.., 0.039586.., ....
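The finite double sum for the simultaneous completion probability is easy to evaluate exactly. The sketch below assumes the normalisation A(k, r) = (-1)^(k-r) r^k / (r! (k-r)!), so that S(n, k) equals the sum over r of A(k, r) r^(n-k), and it treats d = 1 separately because that explicit formula needs k ≥ 1. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def coefficient_a(k, r):
    """A_{k,r} = (-1)^(k-r) r^k / (r! (k-r)!), so that S(n,k) = sum_r A_{k,r} r^(n-k)."""
    return Fraction((-1) ** (k - r) * r ** k, factorial(r) * factorial(k - r))


def simultaneous_completion(d):
    """Probability that two independent collectors of d coupons finish on the same draw."""
    if d == 1:
        return Fraction(1)  # both collectors finish on the very first draw
    k = d - 1
    total = Fraction(0)
    for r in range(1, k + 1):
        for s in range(1, k + 1):
            # Geometric series in n summed in closed form: 1 / (1 - rs/d^2).
            total += coefficient_a(k, r) * coefficient_a(k, s) / (1 - Fraction(r * s, d * d))
    return Fraction(factorial(d) ** 2, d ** (2 * d)) * total


if __name__ == "__main__":
    for d in range(1, 9):
        value = simultaneous_completion(d)
        print(d, value, float(value))
```

Because every geometric series has ratio rs/d^2 < 1, each term is a well-defined rational number and no truncation of an infinite sum is needed.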
2.2 Neck-and-neck, then always ahead

We encode a sequence of n draws as a path $\omega$ with n + 1 vertices in the lattice $\mathcal{L}$ consisting
of vertices $(i,j)$ and edges $\{(i,j),(i+1,j)\}$ and $\{(i,j),(i+1,j+1)\}$, for all $i,j \ge 0$. The
first coordinate of a vertex in the path gives the number of draws, or steps, and the second
coordinate gives the number of distinct coupons the collector has at that step. Thus $\omega$ starts
at (0,0), indicating the collector has 0 coupons at draw 0, proceeds to (1,1) (the collector
has 1 coupon after 1 draw), and ends at (n,d), n ≥ d (the collector has a complete collection
at step n). We write $\omega = (0,0)\,\bar\omega\,(n,d)$, where $\bar\omega$ is a path from (1,1) to (n-1,d-1), to
indicate that $\omega$ starts at the vertex (0,0), continues with the first vertex (1,1) in $\bar\omega$, then
follows $\bar\omega$ through to (n-1,d-1), and finally ends with the vertex (n,d).

We assign a weight of i/d to each horizontal edge $\{(j,i),(j+1,i)\}$ in the lattice $\mathcal{L}$.
This is the probability that at the (j+1)st step, the collector draws one of the i distinct
coupons already collected at step j. We assign a weight of 1 - i/d to each northeast edge
$\{(j,i),(j+1,i+1)\}$. The probability that the collector draws the particular sequence of
coupons encoded by the path $\omega$ is given by the product of the weights on the edges of $\omega$.
We let $P(\omega)$ denote this probability.
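The edge weights translate directly into code. In the minimal sketch below a path is represented simply by the list of heights (numbers of distinct coupons) after 0, 1, 2, ... draws; this representation and the name `path_probability` are assumptions made for illustration.

```python
from fractions import Fraction


def path_probability(heights, d):
    """Product of edge weights along a path given by its successive heights.

    A horizontal edge at height i has weight i/d (a repeat is drawn);
    a northeast edge leaving height i has weight 1 - i/d (a new coupon).
    """
    prob = Fraction(1)
    for before, after in zip(heights, heights[1:]):
        if after == before:
            prob *= Fraction(before, d)
        elif after == before + 1:
            prob *= 1 - Fraction(before, d)
        else:
            raise ValueError("each step must keep the height or raise it by one")
    return prob


if __name__ == "__main__":
    # d = 3: new, repeat, new, new, so the collection is complete at draw 4.
    print(path_probability([0, 1, 1, 2, 3], 3))
```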

Suppose one collector, the winner, collects all d distinct coupons for the first time at
step n. (At step n - 1 the winner had d - 1 distinct coupons.) Let $\omega_1$ be the lattice path
which encodes the winner's sequence of draws. Let $\omega_2$ encode the other collector's draws.
We compute the probability p(d) that $\omega_1$ and $\omega_2$ are identical until some point at which the
winner takes the lead and the other collector never catches up.

To do this, we begin by supposing $\omega_1$ is identical to $\omega_2$ until step k, at which point both
collectors have $d_1$ distinct coupons. The argument splits into two cases, namely k ≤ n - 2
and k = n - 1. In both cases, at step k + 1 the winner collects one additional distinct coupon
while the other collector does not. After step k, the two paths never intersect again. The
winner collects all d distinct coupons for the first time at step n. Suppose the other collector
has $d_2$ distinct coupons at this point. The probability we seek is

\[ p(d) = \sum_{n=d}^{\infty} \sum_{k=1}^{n-1} \sum_{d_1=1}^{d-1} \sum_{d_2=d_1}^{d-1} \sum_{(\omega_1,\omega_2)} P(\omega_1)P(\omega_2), \tag{8} \]

where the innermost sum ranges over all pairs $(\omega_1,\omega_2)$ described above.

2.3 The case k ≤ n - 2

Write $\omega_1 = \alpha\,\bar\omega_1\,(n,d)$, where $\alpha$ denotes a lattice path from (0,0) to $(k,d_1)$, and $\bar\omega_1$ denotes
a path from $(k+1,d_1+1)$ to $(n-1,d-1)$. Similarly, set $\omega_2 = \alpha\,\bar\omega_2$, where $\alpha$ is as
above, and $\bar\omega_2$ is a path from $(k+1,d_1)$ to $(n,d_2)$. Note that $\bar\omega_1$ and $\bar\omega_2$ are nonintersecting
paths in the lattice $\mathcal{L}$. In terms of these we have $P(\omega_1) = P(\alpha)(1-d_1/d)P(\bar\omega_1)(1/d)$ and
$P(\omega_2) = P(\alpha)(d_1/d)P(\bar\omega_2)$. Hence from (8) we find for the combined probability of all pairs
with k ≤ n - 2,

\[ p(d)_{k \le n-2}
   = \sum_{n=d}^{\infty} \sum_{k=1}^{n-2} \sum_{d_1=1}^{d-1} \sum_{d_2=d_1}^{d-1} \sum_{(\omega_1,\omega_2)}
     \left[ P(\alpha)\left(1-\frac{d_1}{d}\right)P(\bar\omega_1)\frac{1}{d} \right]
     \left[ P(\alpha)\,\frac{d_1}{d}\,P(\bar\omega_2) \right] \]
\[ \qquad = \sum_{n=d}^{\infty} \sum_{k=1}^{n-2} \sum_{d_1=1}^{d-1} \sum_{d_2=d_1}^{d-1}
     \left(1-\frac{d_1}{d}\right)\left(\frac{1}{d}\right)\left(\frac{d_1}{d}\right)
     \sum_{\alpha} P(\alpha)^2 \sum_{(\bar\omega_1,\bar\omega_2)} P(\bar\omega_1)P(\bar\omega_2). \tag{9} \]

At this point we have translated a question about coupon collecting into a problem
involving nonintersecting paths in a lattice. We have set the stage for application of the
Gessel-Viennot theorem. This result concerns pairs of nonintersecting lattice paths with
no constraints on vertices or edges in the paths. For this reason we have written $\omega_1$ and $\omega_2$
in terms of $\bar\omega_1$ and $\bar\omega_2$.

The theorem refers to an arbitrary set $\mathcal{L}$, which we will take to be the lattice defined
earlier, and a weight (or valuation) v, which we take to be P. The theorem equates a sum
of weights of paths with the determinant of a matrix $(a_{ij})_{1 \le i,j \le l}$. The entries of this matrix
are defined by $a_{ij} = \sum_{\omega} v(\omega)$, where $\omega$ ranges over all paths from $A_i$ to $B_j$.

The theorem requires that two given sequences, $(A_1,\ldots,A_l)$ and $(B_1,B_2,\ldots,B_l)$, of
vertices in $\mathcal{L}$, the sets $\Omega_{ij}$, $1 \le i,j \le l$, of all paths in $\mathcal{L}$ between $A_i$ and $B_j$, and the weight v
satisfy both the finiteness and crossing conditions. The finiteness condition requires that the set
of paths in $\Omega_{ij}$ with nonzero weight be finite. The crossing condition requires that paths in
$\Omega_{ij'}$ and $\Omega_{i'j}$, i < i' and j < j', with nonzero weight share a common vertex. Both conditions
hold for the paths we consider.

Theorem 1 (Gessel-Viennot) Suppose $\mathcal{L}$, v, $(A_1,A_2,\ldots,A_l)$, and $(B_1,B_2,\ldots,B_l)$ satisfy
both the finiteness and crossing conditions. Then the determinant of the matrix $(a_{ij})_{1 \le i,j \le l}$
is the sum of the weights of all configurations of paths $(\omega_1,\omega_2,\ldots,\omega_l)$ satisfying the following
two conditions:

(i) The paths $\omega_k$ are pairwise nonintersecting, and

(ii) $\omega_k$ is a path from $A_k$ to $B_k$.

In other words,

\[ \det\left( (a_{ij})_{1 \le i,j \le l} \right) = \sum_{(\omega_1,\omega_2,\ldots,\omega_l)} v(\omega_1)v(\omega_2)\cdots v(\omega_l). \]
Application of this theorem to our problem requires the computation of only a 2 × 2
determinant! Let $A_1 = (k+1,d_1+1)$, $A_2 = (k+1,d_1)$, $B_1 = (n-1,d-1)$, and $B_2 = (n,d_2)$.
Then

\[ \sum_{(\bar\omega_1,\bar\omega_2)} P(\bar\omega_1)P(\bar\omega_2) = \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \tag{10} \]

where $a_{ij}$ is the sum $\sum_{\omega} P(\omega)$ over all paths $\omega$ from $A_i$ to $B_j$.
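For small parameters the determinant identity can be checked by brute force. The sketch below enumerates all weighted lattice paths between the four endpoints, sums the products of weights over vertex-disjoint pairs, and compares the result with a_11 a_22 - a_12 a_21. The helper names and the particular parameter values are illustrative assumptions, and the enumeration is exponential, so it is only meant for tiny cases.

```python
from fractions import Fraction


def weighted_paths(start, end, d):
    """Yield (vertices, weight) for every lattice path from start to end.

    A horizontal edge leaving height y has weight y/d; a northeast edge
    leaving height y has weight 1 - y/d.
    """
    x, y = start
    if x > end[0] or y > end[1]:
        return
    if x == end[0]:
        if y == end[1]:
            yield [start], Fraction(1)
        return
    for tail, w in weighted_paths((x + 1, y), end, d):
        yield [start] + tail, Fraction(y, d) * w
    for tail, w in weighted_paths((x + 1, y + 1), end, d):
        yield [start] + tail, (1 - Fraction(y, d)) * w


def path_sum(start, end, d):
    """Sum of the weights of all paths from start to end."""
    return sum((w for _, w in weighted_paths(start, end, d)), Fraction(0))


def determinant_and_brute_force(d, d1, d2, k, n):
    """Return (2x2 determinant, direct sum over nonintersecting path pairs)."""
    a1, a2 = (k + 1, d1 + 1), (k + 1, d1)
    b1, b2 = (n - 1, d - 1), (n, d2)
    det = path_sum(a1, b1, d) * path_sum(a2, b2, d) - path_sum(a1, b2, d) * path_sum(a2, b1, d)
    brute = Fraction(0)
    for upper, w_upper in weighted_paths(a1, b1, d):
        upper_vertices = set(upper)
        for lower, w_lower in weighted_paths(a2, b2, d):
            if upper_vertices.isdisjoint(lower):
                brute += w_upper * w_lower
    return det, brute


if __name__ == "__main__":
    print(determinant_and_brute_force(4, 1, 2, 1, 6))
```

Agreement of the two numbers reflects the tail-swapping involution behind the theorem: every pair of paths from $A_1$ to $B_2$ and from $A_2$ to $B_1$ must share a vertex, so only nonintersecting pairs survive in the determinant.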



2.4 Paths from A to B 

In this section, we compute the probability $P(\omega)$ of an arbitrary path $\omega$ from a vertex
$A = (a_1,b_1)$ to a vertex $B = (a_2,b_2)$, as well as the sum over all such paths. Such a path
contains $b_2 - b_1$ northeast edges $\{(i,j),(i+1,j+1)\}$ and $(a_2-a_1) - (b_2-b_1)$ horizontal edges.

The weights assigned to northeast edges in order from left to right are
$1 - \frac{b_1}{d},\ 1 - \frac{b_1+1}{d},\ \ldots,\ 1 - \frac{b_2-1}{d}$.
The weight assigned to a horizontal edge depends on its coordinates. Consider the edge
$\{(i,j),(i+1,j)\}$. This edge indicates the collector has j distinct coupons at step i and draws
one of the same j coupons at step i + 1. The probability of this (weight of the edge) is $j/d$.
Thus the probability of a path $\omega$ from A to B is

\[ P(\omega) = \frac{1}{d^{a_2-a_1}} \frac{(d-b_1)!}{(d-b_2)!}\; b_1^{e_1} (b_1+1)^{e_2} \cdots (b_2)^{e_{b_2-b_1+1}}, \tag{11} \]

where $e = (e_1,e_2,\ldots,e_{b_2-b_1+1})$ is an ordered partition, a composition, of $(a_2-a_1) - (b_2-b_1)$
into $b_2 - b_1 + 1$ nonnegative integer parts. With this we compute the sum of the probabilities
of all paths from A to B,

\[ \sum_{\omega = A \cdots B} P(\omega) = \frac{1}{d^{a_2-a_1}} \frac{(d-b_1)!}{(d-b_2)!} \sum_{e} b_1^{e_1} (b_1+1)^{e_2} \cdots (b_2)^{e_{b_2-b_1+1}}, \tag{12} \]

where the sum is over all compositions $e = (e_1,e_2,\ldots,e_{b_2-b_1+1})$ of $(a_2-a_1) - (b_2-b_1)$ into
$b_2 - b_1 + 1$ nonnegative integer parts. The sum over e is the coefficient of $x^{(a_2-a_1)-(b_2-b_1)}$ in the series
expansion of

\[ \frac{1}{(1-b_1x)(1-(b_1+1)x)\cdots(1-b_2x)}, \]

so we can find a simpler formula for it by looking at the partial fraction expansion

\[ \frac{1}{\prod_{m=a}^{b}(1-mx)} = \sum_{m=a}^{b} \frac{C_m}{1-mx}, \tag{13} \]

where

\[ C_m = \frac{(-1)^{b-m}\, m^{b-a}}{(b-a)!} \binom{b-a}{m-a}. \]

From this and (12) we obtain

\[ \sum_{\omega = A \cdots B} P(\omega)
   = \frac{1}{d^{a_2-a_1}} \frac{(d-b_1)!}{(d-b_2)!} \left[ x^{(a_2-a_1)-(b_2-b_1)} \right] \sum_{m=b_1}^{b_2} \frac{C_m}{1-mx} \]
\[ \qquad = \frac{1}{d^{a_2-a_1}} \frac{(d-b_1)!}{(d-b_2)!} \sum_{m=b_1}^{b_2} C_m\, m^{(a_2-a_1)-(b_2-b_1)} \]
\[ \qquad = \frac{1}{d^{a_2-a_1}} \frac{(d-b_1)!}{(d-b_2)!\,(b_2-b_1)!} \sum_{m=b_1}^{b_2} (-1)^{b_2-m} \binom{b_2-b_1}{m-b_1} m^{a_2-a_1}. \tag{14} \]
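The closed form for the sum over all paths from A to B can be compared with a direct dynamic-programming computation that propagates probability mass one draw at a time. The sketch below assumes the closed form reads (d-b1)! / ((d-b2)! (b2-b1)! d^(a2-a1)) times the alternating sum of binomial(b2-b1, m-b1) m^(a2-a1) over m from b1 to b2; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def path_sum_closed(a1, b1, a2, b2, d):
    """Closed-form sum of path weights from (a1, b1) to (a2, b2)."""
    steps, rise = a2 - a1, b2 - b1
    prefactor = Fraction(factorial(d - b1), factorial(d - b2) * factorial(rise) * d ** steps)
    alternating = sum(
        (-1) ** (b2 - m) * comb(rise, m - b1) * m ** steps for m in range(b1, b2 + 1)
    )
    return prefactor * alternating


def path_sum_dp(a1, b1, a2, b2, d):
    """Same quantity by propagating weights one draw at a time."""
    weights = {b1: Fraction(1)}
    for _ in range(a2 - a1):
        nxt = {}
        for h, w in weights.items():
            nxt[h] = nxt.get(h, 0) + w * Fraction(h, d)                # repeat coupon
            nxt[h + 1] = nxt.get(h + 1, 0) + w * (1 - Fraction(h, d))  # new coupon
        weights = nxt
    return weights.get(b2, Fraction(0))


if __name__ == "__main__":
    print(path_sum_closed(0, 0, 6, 3, 5), path_sum_dp(0, 0, 6, 3, 5))
```

With A = (0,0) the quantity is just the probability of holding exactly b2 distinct coupons after a2 draws, which gives an independent interpretation of the formula.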

2.5 Evaluating the determinant 

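We use the results of the previous section to evaluate the determinant in (10). To compute
$a_{11}$, we substitute $A = A_1 = (k+1,d_1+1)$ and $B = B_1 = (n-1,d-1)$ in (14). This yields

\[ a_{11} = \frac{d-d_1-1}{d^{n-k-2}} \sum_{m=d_1+1}^{d-1} (-1)^{d-m-1} \binom{d-d_1-2}{m-d_1-1} m^{n-k-2}. \tag{15} \]

In a similar manner we obtain

\[ a_{12} = \frac{1}{d^{n-k-1}} \frac{(d-d_1-1)!}{(d-d_2)!\,(d_2-d_1-1)!} \sum_{m=d_1+1}^{d_2} (-1)^{d_2-m} \binom{d_2-d_1-1}{m-d_1-1} m^{n-k-1}, \tag{16} \]

\[ a_{21} = \frac{d-d_1}{d^{n-k-2}} \sum_{m=d_1}^{d-1} (-1)^{d-m-1} \binom{d-d_1-1}{m-d_1} m^{n-k-2}, \tag{17} \]

\[ a_{22} = \frac{1}{d^{n-k-1}} \frac{(d-d_1)!}{(d-d_2)!\,(d_2-d_1)!} \sum_{m=d_1}^{d_2} (-1)^{d_2-m} \binom{d_2-d_1}{m-d_1} m^{n-k-1}. \tag{18} \]

Here $a_{12} = 0$ when $d_2 = d_1$, since no path leads from height $d_1+1$ down to height $d_1$.

Using (15)-(18) we compute the determinant of our 2 × 2 matrix,

\[ \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{19} \]
\[ \qquad = \det(d,d_1,d_2,k,n). \tag{20} \]

Substituting (20) in (10), we obtain

\[ \sum_{(\bar\omega_1,\bar\omega_2)} P(\bar\omega_1)P(\bar\omega_2) = \det(d,d_1,d_2,k,n). \tag{21} \]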

2.6 The initial common segment 

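In the previous section we evaluated the determinant in (10). In this section we compute
the sum $\sum_{\alpha} P(\alpha)^2$ in (9). Recall $\alpha$ is a path from (0,0) to $(k,d_1)$.

Equation (11) gives the probability of an arbitrary path from A to B. Substituting
A = (0,0) and B = $(k,d_1)$ gives the probability

\[ P(\alpha) = \frac{1}{d^{k}} \frac{d!}{(d-d_1)!}\; 1^{e_1} 2^{e_2} \cdots d_1^{e_{d_1}} \]

of an arbitrary path $\alpha$ from (0,0) to $(k,d_1)$. It follows that

\[ \sum_{\alpha = (0,0) \cdots (k,d_1)} P(\alpha)^2
   = \frac{d!^2}{d^{2k}(d-d_1)!^2} \sum_{e_1+\cdots+e_{d_1} = k-d_1} (1^2)^{e_1} (2^2)^{e_2} \cdots (d_1^2)^{e_{d_1}} \]
\[ \qquad = \frac{d!^2}{d^{2k}(d-d_1)!^2} \left[ x^{k-d_1} \right] \left\{ \frac{1}{(1-1^2x)(1-2^2x)\cdots(1-d_1^2x)} \right\} \]
\[ \qquad = \frac{d!^2}{d^{2k}(d-d_1)!^2} \sum_{m=1}^{d_1} C_m\, m^{2(k-d_1)} \qquad \text{(as in (13))} \]
\[ \qquad = \frac{2\,d!^2}{(d-d_1)!^2\, d^{2k}\, (2d_1)!} \sum_{m=1}^{d_1} (-1)^{d_1-m} \binom{2d_1}{d_1+m} m^{2k} \tag{22} \]
\[ \qquad = \mathrm{init}(d,d_1,k), \tag{23} \]

where now $C_m = 2(-1)^{d_1-m} m^{2d_1} / \left( (d_1-m)!\,(d_1+m)! \right)$ are the partial fraction coefficients of the
product in braces.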



2.7 The case k = n - 1 

Suppose now that the two walks are identical up to the point (n-1,d-1). Since step n is
the finish, the next step for the winning player will be to (n,d), and for the loser, to (n,d-1).
These last steps have respective probabilities 1/d and 1 - 1/d. Hence the probability of the
complete pair of walks in this case is the probability of two identical walks from (0,0) to
(n-1,d-1) (which is given by (22) with $(k,d_1) := (n-1,d-1)$) multiplied by $(d-1)/d^2$.

2.8 Putting it together

We now substitute (21) and (23) into (9) to obtain the probability of all pairs of paths that
we are considering,

\[ p(d) = \sum_{n=d}^{\infty} \sum_{k=1}^{n-2} \sum_{d_1=1}^{d-1} \sum_{d_2=d_1}^{d-1}
   \left(1-\frac{d_1}{d}\right)\left(\frac{1}{d}\right)\left(\frac{d_1}{d}\right)
   \mathrm{init}(d,d_1,k)\,\det(d,d_1,d_2,k,n)
   + \frac{d-1}{d^2} \sum_{n=d}^{\infty} \mathrm{init}(d,d-1,n-1) \tag{24} \]
\[ \qquad = S_1 + S_2. \tag{25} \]

It turns out that the sums over the indices $d_2$, n, k can all be carried out in explicit closed
form. Hence we can obtain an expression which is in finite terms for the total probability.
First, the sum on $d_2$ in $S_1$ above can be done in closed form since

Next, the remaining sum over the indices n and k in the first summation $S_1$ is

n^[a,r,S,l) - 2^ 2^ ,2n ~ ] r\st)'i-\r^-d2)+str^''id2-st) ,i • '^^"^ 

n=d k=i " rf2d-2(^2_^j)(^2_^2)(r2_5t)r2s' oxuerwise. 

The sum over n in $S_2$ is trivial, and so there remain no infinite sums in our final expression
for the probability p(d), which is

y ^dl^dM-d,) ^_,Y.-r-s-u_^J 2d, \(d-d,-l\(d-dA 
,^,(rf-rfi)!W.4; ^ ^ \d, + r)[ s-d, jU-tj^^"'^'^'^^ 

4(rf-l)rf!^ ( 2d -2 

^d^<^-\2d-2)\}^}~^^ \d-l + r)l^^^^''' 

where the sum over n and k is given by (26).

This is the probability that the game is of the type we described, namely where the players 
are tied for some initial segment of trials and then the player who pulls ahead remains ahead 
always, expressed as a finite sum (albeit a complicated one!). More precisely, the values of 
p{d) can be calculated, as rational numbers, with 0{d^) evaluations of the above summand. 
The exact values of p(d), for d = 1, 2, 3, 4, 5, ... are

1, 2/3, 43/70, 986/2275, 5672893/19122246, ...

As decimals, the values of p(d) are

{1.0, 0.66667, 0.61429, 0.43341, 0.29667, 0.21177, 0.16016, 0.12748, 0.10551, 0.08988}.
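One way to cross-check these values independently of the lattice-path sums is to treat the two collectors as a small Markov chain on their numbers of distinct coupons. The sketch below is such a check, not the method of the text: it assumes d ≥ 2, counts either player as the eventual winner, and removes the "nothing changes" self-loops by dividing by their complement. The name `tie_then_lead_probability` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def tie_then_lead_probability(d):
    """Probability that the players are tied at first and that the first
    player to pull ahead then stays strictly ahead until completing."""
    if d < 2:
        raise ValueError("this sketch assumes d >= 2")

    @lru_cache(maxsize=None)
    def lead(a, b):
        # Leader holds a coupons, follower holds b < a, and a < d.
        stay_a, gain_a = Fraction(a, d), 1 - Fraction(a, d)
        stay_b, gain_b = Fraction(b, d), 1 - Fraction(b, d)
        total = Fraction(0)
        if a + 1 == d:
            total += gain_a  # leader completes; follower is still short
        else:
            total += gain_a * stay_b * lead(a + 1, b)
            total += gain_a * gain_b * lead(a + 1, b + 1)
        if b + 1 < a:
            total += stay_a * gain_b * lead(a, b + 1)
        # If b + 1 == a the follower catches up, which counts as failure.
        return total / (1 - stay_a * stay_b)

    @lru_cache(maxsize=None)
    def tied(c):
        # Both players hold c < d coupons and have been level so far.
        stay, gain = Fraction(c, d), 1 - Fraction(c, d)
        if c + 1 == d:
            total = 2 * stay * gain  # exactly one player completes
        else:
            total = gain * gain * tied(c + 1) + 2 * stay * gain * lead(c + 1, c)
        return total / (1 - stay * stay)

    return tied(1)  # after the first draw both players hold one coupon


if __name__ == "__main__":
    for d in range(2, 11):
        value = tie_then_lead_probability(d)
        print(d, value, round(float(value), 5))
```

For d = 2, 3, 4 this reproduces 2/3, 43/70 and 986/2275.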

2.9 One collector never behind 

In contrast to the problem of staying ahead as soon as the tie is broken, which we have 
solved in the preceding sections, the problem in which the ultimate winner has never been 
behind is unsolved. 

Suppose the winner collects all d distinct coupons for the first time at step n, at which
point the other collector has d' < d distinct coupons. We discuss the probability b(d) that the
winner has never been behind. We use b for "ballot" since this version of the problem has a
distinct ballot-problem flavor (see []).

Let $\omega_1$ be the lattice path which encodes the winner's sequence of draws. Let $\omega_2$ encode
the other collector's sequence of draws. Then b(d) is the probability that $\omega_2$ does not cross
$\omega_1$. To say $\omega_2$ does not cross $\omega_1$ means for each horizontal coordinate i shared by vertices
$(i,j_1)$ in $\omega_1$ and $(i,j_2)$ in $\omega_2$, we have $j_2 \le j_1$. In the case $j_2 = j_1$, we say $\omega_1$ and $\omega_2$ intersect
at $(i,j_1) = (i,j_2)$. Thus we seek all pairs $(\omega_1,\omega_2)$ such that $\omega_1$ is a path from (0,0) to (n,d)
including the vertex (n-1,d-1), $\omega_2$ is a path from (0,0) to (n,d') for 1 ≤ d' < d, and $\omega_2$
does not cross $\omega_1$. Such a pair $(\omega_1,\omega_2)$ is illustrated by Figure 1. Note that $\omega_1$ and $\omega_2$ may
intersect several times. The probability we seek is

\[ b(d) = \sum_{n=d}^{\infty} \sum_{d'=1}^{d-1} \sum_{(\omega_1,\omega_2)} P(\omega_1)P(\omega_2), \]

where the innermost sum ranges over all pairs described above.

Look again at Figure 1. A pair $(\omega_1,\omega_2)$ appears to form a chain of flying kites anchored
to the ground at (0,0). The highest kite has two ribbons attached to its tip. Their loose
ends are at (n,d) and (n,d').



Each kite consists of a frame together with a tail. See Figure 2. A frame from $(i_1,j_1)$
to $(i_2,j_2)$ consists of a pair of paths from $(i_1,j_1)$, the lower tip of the frame, to $(i_2,j_2)$, the
upper tip, which intersect only at the endpoints. A tail from $(i_1,j_1)$ to $(i_2,j_2)$ consists of
two identical paths between these endpoints. The length of a tail is the number of vertices
in the tail minus one, i.e., the number of edges.

Figure 2: A kite

A pair $(\omega_1,\omega_2)$ such that $\omega_2$ does not cross $\omega_1$ forms an alternating sequence of tails and
frames, beginning with a tail. Note that tails may have length zero.

The upper tip of the final frame in this sequence is the common endpoint for two paths
which intersect only at this common endpoint (these are the "ribbons" described above).
One path ends at (n,d), this is the top ribbon, and the other ends at (n,d'), the bottom
ribbon.

Below we compute the probability of a frame from $(i_1,j_1)$ to $(i_2,j_2)$, a tail from $(i_1,j_1)$ to
$(i_2,j_2)$, and a pair of ribbons with common initial point (k,d'') and terminal points at (n,d)
and (n,d'), respectively.

Let $f^{(i_2,j_2)}_{(i_1,j_1)}(d)$ denote the probability of a frame from $(i_1,j_1)$ to $(i_2,j_2)$. In order that
$f^{(i_2,j_2)}_{(i_1,j_1)}(d) \ne 0$, we must have $i_2 \ge i_1 + 2$, $j_2 > j_1$, and $j_2 - j_1 \le i_2 - i_1 - 1$. Assuming these conditions,
we write

\[ f^{(i_2,j_2)}_{(i_1,j_1)}(d) = \sum_{(\alpha,\beta)} P(\alpha)P(\beta), \tag{28} \]

where $(\alpha,\beta)$ is a pair of paths from $(i_1,j_1)$ to $(i_2,j_2)$ intersecting only at the endpoints such
that $\beta$ does not cross $\alpha$ (i.e., $\alpha$ forms the upper edge of the frame, and $\beta$ forms the lower
edge).

We convert the sum above into a determinant using the Gessel-Viennot theorem.
Evaluation of the determinant gives

We compute the probability $t^{(i_2,j_2)}_{(i_1,j_1)}(d)$ of a tail from $(i_1,j_1)$ to $(i_2,j_2)$ in a manner analogous
to the computation of $\sum_{\alpha} P(\alpha)^2$ in section 2.6. We obtain



%..0(^)-rf2fe-.0(2,,)!(rf-j2)!^i'/ 1) ^ (^^)- 



"1 \j2 + mj \ ji-m J 

Finally we compute the probability r(d,d',d'',k,n) of a pair of ribbons with common
initial point (k,d'') and terminal points (n,d') and (n,d). The probability is given by a
determinant similar to the one in (10). In the present case, we have d'' in place of $d_1$ and d'
in place of $d_2$. Thus

r(d, d', d'', k, n) = det(d, d'', d', k, n)

3 The "double dixie-cup problem" of Newman and Shepp, revisited

Here we consider a different generalization of the coupon collector's problem. Let integers
h, d ≥ 1 be fixed. Again we are sampling with replacement from d kinds of coupons, but now
T is the epoch at which we have collected at least h copies of each of the d coupons, for the
first time (for example, my h - 1 siblings and I might each want to have our own copy of every
one of the available baseball cards). We study the expectation, the probability generating
function, and the asymptotic behavior of the expectation, of this generalized problem.

These questions were investigated by Newman and Shepp and the asymptotics were
refined by Erdős-Rényi. It is interesting to note that this problem is equivalent to one
about the evolution of a random graph. Suppose we fix n vertices, and then we begin to
collect from among n kinds of coupons. If we collect a particular sequence, say, $\{c_1,c_2,c_3,\ldots\}$
then we add the edges $(c_1,c_2),(c_3,c_4),\ldots$. That is, we add an edge each time we choose a
new pair of coupons. Our problem about collecting at least h copies of each kind of coupon
is thereby equivalent to the question of obtaining a minimum degree of at least h in an
evolving random graph.¹ In this section we will not add anything new to the asymptotics of
this problem. Instead we claim only a simpler derivation than the original, and an explicit
generating function, which gives a nice road to the asymptotics. We deal only with generating
functions in one variable, whereas in [2] multivariate generating functions were used. We
obtain not only the expectation of the time to reach a collection that has at least h copies
of each kind of coupon, but also the complete probability distribution of that time.

¹Our thanks to Ed Bender and to a helpful referee for pointing this out.

For n fixed, consider a sequence of n drawings of coupons that constitutes, for the first
time at the nth drawing, a complete collection of at least h copies of each of the d kinds of
coupons.

There are d possibilities for the coupon that completes the collection on the nth drawing.
There are $\binom{n-1}{h-1}$ ways to choose the set of earlier drawings on which that last coupon type
occurred. On the remaining n - h drawings we can define, as usual, an equivalence relation:
two drawings i, j are equivalent if the same kind of coupon was drawn at the ith and the
jth drawings. The number of such equivalence relations is equal to the number of ordered
partitions of a set of n - h elements into d - 1 classes, each class containing at least h
elements. We will denote this latter number by $(d-1)!\,{n-h \brace d-1}_h$, where the ${n \brace k}_h$'s count the
unordered partitions of an n-set into k classes of at least h elements each.

The number of sequences of n drawings for which we achieve a complete collection for
the first time at the nth drawing is therefore

\[ d \binom{n-1}{h-1} (d-1)!\, {n-h \brace d-1}_h. \tag{29} \]

Since there are $d^n$ possible drawing sequences of length n, the probability that T = n is

\[ \frac{d!}{d^n} \binom{n-1}{h-1} {n-h \brace d-1}_h, \]

and the probability generating function is

\[ P_h(x) = d! \sum_{n \ge 0} \binom{n-1}{h-1} {n-h \brace d-1}_h \left(\frac{x}{d}\right)^n
          = d! \binom{xD-1}{h-1} \sum_{n \ge 0} {n-h \brace d-1}_h \left(\frac{x}{d}\right)^n, \tag{30} \]

where D = d/dx.

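It remains to find the ordinary power series generating function of the ${n \brace k}_h$'s. The
exponential formula gives us immediately their exponential generating function, as

\[ \sum_{n \ge 0} {n \brace k}_h \frac{x^n}{n!} = \frac{1}{k!} \left( e^x - 1 - x - \cdots - \frac{x^{h-1}}{(h-1)!} \right)^k. \tag{31} \]

We can convert this into an ordinary power series generating function by applying the Laplace
transform operator

\[ \frac{1}{t} \int_0^{\infty} e^{-x/t} \cdots \, dx \]

to both sides, which yields, since $\frac{1}{t}\int_0^{\infty} e^{-x/t} x^n\,dx = n!\,t^n$,

\[ \sum_{n \ge 0} {n \brace k}_h t^n = \frac{1}{k!\,t} \int_0^{\infty} e^{-x/t} \left( e^x - 1 - x - \cdots - \frac{x^{h-1}}{(h-1)!} \right)^k dx. \tag{32} \]

Now if we substitute (32), with k = d - 1, with t replaced by x/d, and with the variable of
integration renamed t, into (30) we obtain the probability generating function of the
generalized coupon collector's problem in the form

\[ P_h(x) = d \binom{xD-1}{h-1} \left\{ \left(\frac{x}{d}\right)^{h-1} \int_0^{\infty} e^{-dt/x} \left( e^t - 1 - t - \cdots - \frac{t^{h-1}}{(h-1)!} \right)^{d-1} dt \right\}. \tag{33} \]

In the above, $\binom{xD-1}{h-1}$ is the differential operator that is defined by

\[ \binom{xD-1}{h-1} = \frac{(xD-1)(xD-2)\cdots(xD-h+1)}{(h-1)!}. \]

However, it is easy to establish, by induction on h, the interesting fact that

\[ \binom{xD-1}{h-1} \left\{ x^{h-1} e^{-a/x} \right\} = \frac{a^{h-1}}{(h-1)!}\, e^{-a/x}. \tag{34} \]

Hence we have proved the following evaluation.

Theorem 2 The probability generating function for the coupon collecting problem in which
at least h copies of each coupon are needed is given by

\[ P_h(x) = \frac{d}{(h-1)!} \int_0^{\infty} t^{h-1} e^{-dt/x} \left( e^t - 1 - t - \cdots - \frac{t^{h-1}}{(h-1)!} \right)^{d-1} dt. \tag{35} \]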
3.1 Two examples 

Let's look at the cases h = 1, the classical case, and h = 2, where we want to collect at least
two specimens of each of the d kinds of coupons.
If h = 1 then (35) takes the form

\[ P_1(x) = d \int_0^{\infty} e^{-dt/x} (e^t - 1)^{d-1}\, dt. \]

If we expand the power of $(e^t - 1)$ by the binomial theorem and integrate termwise we obtain

\[ P_1(x) = \sum_{j=0}^{d-1} (-1)^{d-1-j} \binom{d-1}{j} \frac{dx}{d - jx}, \]

which is precisely the partial fraction expansion of the classical generating function
$\sum_n p(n,d)x^n$ of the probabilities (1).
To see something new, let h = 2. Then

\[ P_2(x) = d \int_0^{\infty} t\, e^{-dt/x} (e^t - 1 - t)^{d-1}\, dt. \tag{36} \]

Again, by termwise integration this can be made fairly explicit, but since the most interest
attaches to the expectation, let's look at the average number of trials that are needed to
collect at least two samples of each of d coupons. This is $P_2'(1)$, which after some simplification
takes the form

\[ P_2'(1) = d^2 \int_0^{\infty} t^2 e^{-t} \left( 1 - (1+t)e^{-t} \right)^{d-1} dt. \tag{37} \]
From this we can go in either of two directions, an exact evaluation or an asymptotic
approximation. By termwise integration it is easy to obtain the following exact formula,
which is a finite sum, for $\langle T \rangle_2$, the average number of trials needed to collect at least two of
each of the d kinds of coupons:

For d = 1, 2, 3, 4 these are 2, 11/2, 347/36, 12259/864. To facilitate comparison with the
classical (h = 1) case, we show below, for 1 ≤ d ≤ 10, a table of the expected numbers of
trials needed when h = 1, 2.

d        1       2       3       4       5       6       7       8       9       10
⟨T⟩_1    1.0000  3.0000  5.5000  8.3333  11.417  14.700  18.150  21.743  25.460  29.290
⟨T⟩_2    2.0000  5.5000  9.6389  14.189  19.041  24.134  29.425  34.885  40.492  46.230
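The tabulated expectations can be reproduced exactly by termwise integration. The sketch below assumes the equivalent form ⟨T⟩_h = d ∫_0^∞ [1 - (1 - S_h(t) e^{-t})^d] dt, with S_h(t) = 1 + t + ... + t^{h-1}/(h-1)!, which follows from the integral for P_h'(1) by an integration by parts; it expands the d-th power by the binomial theorem and uses ∫_0^∞ t^i e^{-jt} dt = i!/j^{i+1}. The resulting finite sum is one possible exact formula and is not claimed to be the form printed in the text; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expected_trials(d, h):
    """Exact expected number of draws to hold at least h copies of each of d coupons."""
    base = [Fraction(1, factorial(k)) for k in range(h)]  # 1 + t + ... + t^(h-1)/(h-1)!
    power = [Fraction(1)]
    total = Fraction(0)
    for j in range(1, d + 1):
        power = poly_mul(power, base)  # coefficients of S_h(t)^j
        # Integral of S_h(t)^j * exp(-j t) over [0, infinity), term by term.
        integral = sum(c * Fraction(factorial(i), j ** (i + 1)) for i, c in enumerate(power))
        total += (-1) ** (j + 1) * comb(d, j) * integral
    return d * total


if __name__ == "__main__":
    for d in range(1, 11):
        print(d, float(expected_trials(d, 1)), float(expected_trials(d, 2)))
```

For h = 1 the sum collapses to d times the harmonic number H_d, which gives the first row of the table.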



3.2 Asymptotics 

Now we investigate the asymptotic behavior of (37), for large d, to compare it with the
d log d behavior of the classical case where h = 1.

Theorem 3 If there are d different kinds of coupons, and if at each step we sample one of
the d kinds with uniform probability, let $\langle T \rangle_h$ denote the average number of samples that we
must take until, for the first time, we have collected at least h specimens of each of the d
kinds of coupons. Then for every h ≥ 1, we have $\langle T \rangle_h \sim d \log d$ (d → ∞).

Consider first the case h = 2. In (37) we make the substitution

\[ e^{-u} = 1 - (1+t)e^{-t}, \tag{39} \]

where u is a new variable of integration. We then find that

\[ P_2'(1) = d^2 \int_0^{\infty} t(u)\, e^{-du}\, du, \tag{40} \]

where t(u) is the inverse function of the substitution (39), which is well defined since the
right side of (39) increases steadily from 0 to 1 as t increases from 0 to ∞.

The main contribution to $P_2'(1)$ comes from values of u near u = 0, and when u is near
0 we have

\[ t(u) = -\log u + O(\log\log(1/u)). \]

Following the arguments in [7], sec. 2.2, we see that $P_2'(1)$ in (40) has the same asymptotic
behavior as

\[ d^2 \int_0^{c} (-\log u)\, e^{-du}\, du, \qquad (0 < c < 1) \]

and in [7] this is shown to be

\[ \sim d^2 \cdot \frac{\log d}{d} = d \log d. \]

Now we consider the asymptotic behavior of the expected number of trials for general
values of h. From (35) we see that this expected number of trials can be written in the form

\[ P_h'(1) = \frac{d^2}{(h-1)!} \int_0^{\infty} t^{h} e^{-t} \left( 1 - \left( 1 + t + \frac{t^2}{2!} + \cdots + \frac{t^{h-1}}{(h-1)!} \right) e^{-t} \right)^{d-1} dt. \tag{41} \]

Again we make the change of variable

\[ e^{-u} = 1 - \left( 1 + t + \frac{t^2}{2!} + \cdots + \frac{t^{h-1}}{(h-1)!} \right) e^{-t} \tag{42} \]

in the integral, and it takes the remarkably simple form (compare (40))

\[ P_h'(1) = d^2 \int_0^{\infty} t(u)\, e^{-du}\, du, \]

where t(u) is the inverse function of the substitution (42). Again the main contribution to
the integral comes from small values of u, and when u is small and positive we have

\[ t(u) = -\log u + (h-1)\log(-\log u) + \cdots \]

Using the method of sec. 2.2 of [7] once more, we find that

\[ \langle T \rangle_h = d \log d + (h-1)\, d \log\log d\, (1 + o(1)) \qquad (d \to \infty). \tag{43} \]

We remark that in the case of d = 200 coupons, the correct expected number of trials
to obtain two of each coupon is 1614 trials, the approximation d log d is 1060, and the
approximation d log d + (h - 1) d log log d is 1393, each rounded to the nearest integer.
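The two asymptotic approximations are simple to evaluate numerically. The sketch below uses natural logarithms, as the asymptotic formulas require; the function names are illustrative.

```python
import math


def leading_term(d):
    """First-order approximation d log d."""
    return d * math.log(d)


def two_term_approximation(d, h):
    """Approximation d log d + (h - 1) d log log d."""
    return d * math.log(d) + (h - 1) * d * math.log(math.log(d))


if __name__ == "__main__":
    print(round(leading_term(200)))
    print(round(two_term_approximation(200, 2)))
```

For d = 200 and h = 2 the log log correction adds roughly 333 trials to the leading term, which shows how slowly the relative error of d log d alone decays.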



4 The number of singletons 

In view of the asymptotics in the preceding section we realize that at the moment when a
coupon collector sequence terminates with a complete collection, 'most' coupons will have
been collected more than once, and only 'a few' will have been collected just once. We call
a coupon that has been seen just once a singleton. We will now look at the distribution of
singletons.

In more detail, let j be the number of singletons in a collecting sequence that terminates
successfully at the nth step. We first want the joint distribution f(n,j) of n and j, i.e., the
probability that a collecting sequence halts successfully at the nth step, and has exactly j
singletons at that moment. We claim that

\[ f(n,j) = \frac{d}{d^n} \binom{d-1}{j-1} (j-1)! \binom{n-1}{j-1} (d-j)!\, {n-j \brace d-j}_2. \tag{44} \]

Indeed, the last coupon to be collected can be chosen in d ways, the other j - 1 singleton
coupons can be chosen in $\binom{d-1}{j-1}$ ways, and can be presented in an ordered sequence in
(j - 1)! ways. This ordered sequence can appear among the first n - 1 trials in $\binom{n-1}{j-1}$
ways, and the remaining n - j trials constitute an ordered partition of n - j elements into
d - j classes, no class having fewer than two elements, which can be chosen in $(d-j)!\,{n-j \brace d-j}_2$
ways. If we multiply these together and divide by $d^n$, the number of n-sequences, we obtain
the result claimed above.

Next we compute the probability that a completed collecting sequence contains exactly j
singletons, whatever the length of the sequence may be. That is, we find $F(j) = \sum_n f(n,j)$,
where f is given by (44). We have, after using the generating function (32),

\[ F(j) = \frac{d!}{(d-j)!} \left[ \binom{tD_t-1}{j-1} \left\{ t^{j-1} \int_0^{\infty} \left( e^x - 1 - x \right)^{d-j} e^{-x/t}\, dx \right\} \right]_{t=1/d}, \tag{45} \]

where $D_t = d/dt$. But using the fact that, analogously to (34), we have

\[ \binom{tD_t-1}{j-1} \left\{ t^{j-1} e^{-x/t} \right\} = \frac{x^{j-1}}{(j-1)!}\, e^{-x/t}, \tag{46} \]

we can simplify the expression for F(j) to

\[ F(j) = j \binom{d}{j} \int_0^{\infty} x^{j-1} \left( e^x - 1 - x \right)^{d-j} e^{-dx}\, dx, \qquad (j = 1, 2, 3, \ldots) \tag{47} \]

which is the desired distribution of the number of singletons in a successfully terminated
coupon collecting sequence.
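The singleton distribution can be evaluated exactly from its integral representation. The sketch below assumes F(j) = j binomial(d, j) ∫_0^∞ x^{j-1} (e^x - 1 - x)^{d-j} e^{-dx} dx, expands the power as a sum of terms e^{ax} x^c, and integrates each term with ∫_0^∞ x^p e^{-qx} dx = p!/q^{p+1}. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def singleton_distribution(d):
    """Return {j: F(j)}, the distribution of the number of singletons at completion."""
    dist = {}
    for j in range(1, d + 1):
        m = d - j
        integral = Fraction(0)
        # (e^x - (1 + x))^m = sum_a C(m,a) e^{ax} (-1)^{m-a} sum_c C(m-a,c) x^c
        for a in range(m + 1):
            for c in range(m - a + 1):
                coeff = (-1) ** (m - a) * comb(m, a) * comb(m - a, c)
                # Integral of x^(j-1+c) * exp(-(d-a) x); note d - a >= j >= 1.
                integral += coeff * Fraction(factorial(j - 1 + c), (d - a) ** (j + c))
        dist[j] = j * comb(d, j) * integral
    return dist


if __name__ == "__main__":
    dist = singleton_distribution(3)
    print(dist)
    print(sum(j * p for j, p in dist.items()))  # mean number of singletons
```

The probabilities should sum to 1, and their mean should equal the harmonic number H_d.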
Now if we multiply by j and sum over j, we'll get the average number of singletons that
appear in a completed collection of d coupons. This is, after some termwise integration,

\[ \bar{j}(d) = d \sum_{m=0}^{d-2} (-1)^m \binom{d-2}{m} \frac{dm + d + 1}{(m+2)^2 (m+1)}. \]

If we expand the summand in partial fractions, viz.,

\[ \bar{j}(d) = d \sum_{m=0}^{d-2} (-1)^m \binom{d-2}{m} \left( \frac{1}{m+1} - \frac{1}{m+2} + \frac{d-1}{(m+2)^2} \right), \]

then each of the three sums indicated can be expressed in closed form, in two cases by using
the identity

\[ \sum_{k} (-1)^k \binom{n}{k} \frac{1}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} \]

directly, with x = 1 and x = 2, and in the third case by differentiating this identity w.r.t. x, and
using the result with x = 2. The identity is itself certified, after multiplying by the
denominator on the right, by the WZ proof certificate R(n, k) = k(x+k)/((n+1)(k-n-1)).
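The partial-fraction form of the average can be summed exactly and compared with the harmonic numbers. The sketch below assumes the signs +1/(m+1), -1/(m+2), +(d-1)/(m+2)^2 inside the sum and d ≥ 2; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def mean_singletons(d):
    """d * sum_m (-1)^m C(d-2, m) * (1/(m+1) - 1/(m+2) + (d-1)/(m+2)^2), for d >= 2."""
    total = Fraction(0)
    for m in range(d - 1):
        term = Fraction(1, m + 1) - Fraction(1, m + 2) + Fraction(d - 1, (m + 2) ** 2)
        total += (-1) ** m * comb(d - 2, m) * term
    return d * total


def harmonic(d):
    """Harmonic number H_d = 1 + 1/2 + ... + 1/d."""
    return sum(Fraction(1, i) for i in range(1, d + 1))


if __name__ == "__main__":
    for d in range(2, 11):
        print(d, mean_singletons(d), harmonic(d))
```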

What results is that $\bar{j}(d) = H_d$, the dth harmonic number. That is, the average number
of singleton coupons in a completed collection sequence of d coupons is the harmonic number